\chapter{Introduction} 

\begin{center} ...a wealth of information creates a poverty of attention...\\
\textit{Herbert A. Simon}
\end{center}

\section{Problem Statement}

In this project, we are trying to extract keywords from a given news article. Intent of doing keywords extraction is to retrieve an relavant image for a given news article without annotated image. So the extracted keywords should closely match with descriptive metadata of an relevant image. After extracting keywords from text, keywords will be used for retrieving an image using \textit{Image Search System or Engine}.

\section{Motivation}

\section{Submodular Functions}

We are given a set of objects $V = \{v_1,...,v_n\}$ and a
function $F : 2^{V} \mapsto R$ that returns a real value for any subset $S \subseteq V$.

\cbox{Subset Function (F) }{\\$F : 2^{V} \mapsto R$}

In text summarization perspective, we are interested in finding subset of bounded size $|S| \leq K $ that maximizes the function F.

\cbox{Maximize the Subset Function }{
\\$S = argmax_{S \subseteq V} F(S) $
\\Subject to $|S| \leq K $.
}

Finding a subset that maximizes this function is hopelessly intractable. The submodular functions have wide applications in various domain including NLP such text summarization and word alignment.

\cbox{Example\\}{
F might correspond to the value or coverage of a set of sensor locations in an environment, and the goal is to find the best locations for a fixed number of sensors.}

if the function is monotone submodular, still the objective is NP-Complete. But there exist a greedy algorithm which will give the near optimal solution.

\section{Greedy Algorithm}

\cbox{Algorithm 1 : Greedy Algorithm\\}{
$G \leftarrow \emptyset$\\
$U \leftarrow V$\\
while $U \neq \emptyset$ do\\
\hspace{20 mm}$k \leftarrow argmax_{l \in U} 
\frac{f(G U \{l\}) - f(G)}{ (c_{l})^{r} }$\\
\hspace{20 mm}$G \leftarrow G U {k}$
if $ \sum_{i in G}{c_i} + c_k \leq B $ and
$f(G U \{k\}) - f(G) \geq 0$\\
\hspace{20 mm}$U = U \ {k}$\\
end while\\
$ v^{*} \leftarrow argmax_{v in V, c_v \leq B} f({v}) $\\
return $G_{f} = argmax_{S \in \{\{v^{*}\},G\}} f(S) $
}

\section{Proof of Near Optimal Solution}

\todo{update later}

\section{Contributions}

Contibutions towards this $R\&D$ project includes, 

\begin{enumerate}
\item {implementation submodular functions for text summarization using similarity score as TFxIDF.}
\item {experiments with semantic similarity measures instead of TFxIDF.}
\item {Using different clustering methods K-means and Single link to improve the diversity of summary sentences.}
\end{enumerate}



